The data explored in this lab are spawner and recruit counts for sockeye salmon from the Kvichak River, a river system in Bristol Bay, Alaska. The data set covers brood years 1952-2005. Spawner counts are missing for 1952-1955 and recruit counts are missing for 1999-2005, leaving a complete window from 1956-1998.
## library(devtools)
## Windows users will likely need to set this
## Sys.setenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS" = "true")
# devtools::install_github("nwfsc-timeseries/atsalibrary")  # already installed
## get data
data(KvichakSockeye, package="atsalibrary")
SR_data <- KvichakSockeye
head(SR_data)
## # A tibble: 6 × 5
## # Groups: brood_year [6]
## brood_year spawners recruits pdo_summer_t2 pdo_winter_t2
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1952 NA 20200 -2.79 -1.68
## 2 1953 NA 593 -1.2 -1.05
## 3 1954 NA 799 -1.85 -1.25
## 4 1955 NA 1500 -0.6 -0.68
## 5 1956 9440 39000 -0.5 -0.31
## 6 1957 2840 4090 -2.36 -1.78
The data are a data frame with columns for brood year (`brood_year`), number of spawners (`spawners`), number of recruits (`recruits`), and PDO at year \(t-2\) in summer (`pdo_summer_t2`) and winter (`pdo_winter_t2`).
par(mfrow=c(2,2))
plot(SR_data$brood_year, SR_data$spawners, type= "l", xlab = "Year", ylab = "Spawners")
plot(SR_data$brood_year, SR_data$recruits, type= "l", xlab = "Year", ylab = "Recruits")
plot(SR_data$brood_year, log(SR_data$recruits/SR_data$spawners) , xlab = "Year", ylab = "log(R/S)", type= "l")
plot(SR_data$spawners, SR_data$recruits, xlab = "Spawners", ylab = "Recruits", pch = 16)
par(mfrow=c(2,1))
plot(SR_data$brood_year, SR_data$pdo_summer_t2, type= "l", xlab = "Year", ylab = "Summer PDO")
abline(h=0, lty = 2)
plot(SR_data$brood_year, SR_data$pdo_winter_t2, type= "l", xlab = "Year", ylab = "Winter PDO")
abline(h=0, lty= 2)
## shrink the window of years so the response variable will have a full set of values
SR_dat_short <- SR_data[SR_data[,"brood_year"] >= 1956 & SR_data[,"brood_year"] <= 1998, ]
#Create a column for the response variable
SR_dat_short$Yt <- log(SR_dat_short$recruits/SR_dat_short$spawners) #Yt represents log of recruits over spawners.
#Create the observation matrix
Yt <- t(SR_dat_short$Yt)
# number of time steps (years)
TT <- nrow(SR_dat_short)
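For reference (our sketch, not from the lab handout), the response variable comes from the classical Ricker stock-recruit model, whose log-linear form is

\[
R_t = S_t \exp(a - b S_t + v_t)
\quad\Longrightarrow\quad
y_t = \log(R_t/S_t) = a - b S_t + v_t, \qquad v_t \sim \text{N}(0, r),
\]

so \(y_t\) is linear in spawners, which is what lets us fit Ricker models as (dynamic) linear models below.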
Use the information and data in the previous section to answer the following questions. Note that if any model is not converging, you will need to increase the `maxit` parameter in the `control` list that gets passed to `MARSS()`. For example, you might try `control = list(maxit = 2000)`.
This model assumes no density-dependent survival, in that the number of recruits is proportional to the number of spawners. Plot the ts of \(\alpha_t\) and note the AICc for this model. Also plot appropriate model diagnostics.
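As a sketch, the reduced Ricker model drops the spawner term and lets productivity follow a random walk, which matches the MARSS specification below (\(Z = 1\), \(B\) identity, \(U = A = 0\)):

\[
y_t = \alpha_t + v_t, \quad v_t \sim \text{N}(0, r); \qquad
\alpha_t = \alpha_{t-1} + w_t, \quad w_t \sim \text{N}(0, q).
\]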
#create the model params for reduced Ricker's model
Z <- array(NA,c(1,1,TT))
Z[1,1,] <- c(1)
modlist <- list( # All other arguments are left as default
Z = Z,
U = "zero",
A = "zero",
B = "identity"
)
reduced_Rick <- MARSS(Yt, model = modlist)
## Success! abstol and log-log tests passed at 24 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
##
## MARSS fit is
## Estimation method: kem
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 24 iterations.
## Log-likelihood: -51.87179
## AIC: 109.7436 AICc: 110.359
##
## Estimate
## R.R 0.211
## Q.Q 0.303
## x0.x0 1.002
## Initial states (x0) defined at t=0
##
## Standard errors have not been calculated.
## Use MARSSparamCIs to compute CIs and bias estimates.
Results: The AICc of this model is 110.36; state plots and residuals are below.
autoplot(reduced_Rick, silent = TRUE)
Notes: The observations and level follow each other exactly. This makes sense considering the level is the only variable in the model. The model residuals don’t seem to follow a trend, but they are clearly not normally distributed. The state residuals have no time trend, and the model smoothation residuals seem to be normally distributed. The ACF plot has significant autocorrelation at lags of 5 and 10 years.
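The full Ricker model below adds a density-dependent spawner term whose coefficient is held constant over time (its process variance is fixed at zero). As a sketch of the state-space form implied by the Z and Q matrices:

\[
y_t = \alpha_t + \beta_t S_t + v_t; \qquad
\alpha_t = \alpha_{t-1} + w_{1,t}, \quad w_{1,t} \sim \text{N}(0, q_1); \qquad
\beta_t = \beta_{t-1},
\]

so \(\beta_t\) is effectively a constant \(\beta\) estimated via its initial state.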
#create the model params for the full Ricker's model
Z <- array(NA,c(1,2,TT)) # the array has 2 elements now
Z[1,1,] <- c(1)
Z[1,2,] <- SR_dat_short$spawners
modlist <- list( # All other arguments are left as default
Z = Z,
U = "zero",
A = "zero",
B = "identity",
Q = ldiag(list("q1", 0)) # custom Q matrix: zero process variance fixes the spawner effect to be constant over time
)
full_Rick <- MARSS(Yt, model = modlist, inits = list(x0 = matrix(c(0),2,1))) # added inits list due to marss error
## Success! abstol and log-log tests passed at 26 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
##
## MARSS fit is
## Estimation method: kem
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 26 iterations.
## Log-likelihood: -51.80921
## AIC: 111.6184 AICc: 112.6711
##
## Estimate
## R.R 0.217557
## Q.q1 0.293165
## x0.X1 0.936822
## x0.X2 0.000007
## Initial states (x0) defined at t=0
##
## Standard errors have not been calculated.
## Use MARSSparamCIs to compute CIs and bias estimates.
Result: AICc is 112.67, higher than the previous model. Multiple attempts at different x0 values each resulted in the same model. The increase in AICc is likely driven by the extra parameters from adding the density-dependence term to the time-varying intercept.
autoplot(full_Rick, silent = TRUE)
Results: Model fit, residuals, and ACF all look the same as the previous model because all the variation was loaded onto the intercept term.
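The next two models add a PDO index \(X_t\) as a covariate with a time-varying effect \(\delta_t\). As a sketch of the form implied by the Z and Q matrices used below:

\[
y_t = \alpha_t + \beta_t S_t + \delta_t X_t + v_t; \qquad
\alpha_t = \alpha_{t-1} + w_{1,t}, \quad
\beta_t = \beta_{t-1}, \quad
\delta_t = \delta_{t-1} + w_{3,t},
\]

with \(w_{1,t} \sim \text{N}(0, q_1)\) and \(w_{3,t} \sim \text{N}(0, q_3)\), so the spawner effect stays constant while the intercept and PDO effect can drift.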
Next we fit a model that also includes the summer PDO index (pdo_summer_t2) as a covariate. What is the mean level of productivity? Plot the ts of \(\delta_t\) and note the AICc for this model. Also plot appropriate model diagnostics.
#create the model params for the summer-PDO Ricker model
Z <- array(NA,c(1,3,TT)) # the array has 3 elements now
Z[1,1,] <- c(1)
Z[1,2,] <- SR_dat_short$spawners
Z[1,3,] <- SR_dat_short$pdo_summer_t2 # Covariate data is added to the Z matrix
# The model states are the covariate effects in a DLM
modlist <- list( # All other arguments are left as default
Z = Z,
U = "zero",
A = "zero",
B = "identity",
Q = ldiag(list("q1",0,"q3")) # custom Q: zero variance fixes the spawner effect; the intercept and PDO effects vary over time. The intercept variance may need to be constrained further
)
summer_Rick <- MARSS(Yt, model = modlist, control = list(maxit=1000), inits = list(x0 = matrix(c(1,1,1),3,1))) # added inits list due to marss error
## Success! abstol and log-log tests passed at 742 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
##
## MARSS fit is
## Estimation method: kem
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 742 iterations.
## Log-likelihood: -51.78324
## AIC: 115.5665 AICc: 117.8998
##
## Estimate
## R.R 2.00e-01
## Q.q1 3.16e-01
## Q.q3 0.00e+00
## x0.X1 1.01e+00
## x0.X2 5.49e-06
## x0.X3 2.85e-02
## Initial states (x0) defined at t=0
##
## Standard errors have not been calculated.
## Use MARSSparamCIs to compute CIs and bias estimates.
Result: AICc is 117.90, higher than the previous models, and all the variance is loaded onto the intercept again. I have tried to limit the variation of the intercept but I get a MARSS error every time. I have also tried to change the initial values, but the resulting model was the same.
autoplot(summer_Rick, silent = TRUE)
Finally we fit a model that instead includes the winter PDO index (pdo_winter_t2) as a covariate. What is the mean level of productivity? Plot the ts of \(\delta_t\) and note the AICc for this model. Also plot appropriate model diagnostics.
#create the model params for the winter-PDO Ricker model
Z <- array(NA,c(1,3,TT)) # the array has 3 elements now
Z[1,1,] <- c(1)
Z[1,2,] <- SR_dat_short$spawners
Z[1,3,] <- SR_dat_short$pdo_winter_t2 # Covariate data is added to the Z matrix
# The model states are the covariate effects in a DLM
modlist <- list( # All other arguments are left as default
Z = Z,
U = "zero",
A = "zero",
B = "identity",
Q = ldiag(list("q1",0,"q3")) # custom Q: zero variance fixes the spawner effect; the intercept and PDO effects vary over time. The intercept variance may need to be constrained further
)
winter_Rick <- MARSS(Yt, model = modlist, control = list(maxit=3000), inits = list(x0 = matrix(c(1,0,0),3,1))) # added inits list due to marss error
## Success! abstol and log-log tests passed at 1358 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
##
## MARSS fit is
## Estimation method: kem
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 1358 iterations.
## Log-likelihood: -51.38573
## AIC: 114.7715 AICc: 117.1048
##
## Estimate
## R.R 1.96e-01
## Q.q1 3.11e-01
## Q.q3 0.00e+00
## x0.X1 1.13e+00
## x0.X2 1.32e-06
## x0.X3 1.54e-01
## Initial states (x0) defined at t=0
##
## Standard errors have not been calculated.
## Use MARSSparamCIs to compute CIs and bias estimates.
Result: AICc is 117.10, higher than the previous models, and all the variance is loaded onto the intercept again. I have tried to limit the variation of the intercept but I get a MARSS error every time. I have also tried to change the initial values, but the resulting model was the same.
autoplot(winter_Rick, silent = TRUE)
Why is all the variation loaded onto the intercept again? Is this just a feature of the model? The winter PDO effect is nonzero but still small compared to the intercept.
#Summary Table
#Someone want to check I pulled the correct things for meanprod and SE?
mods <- c("Reduced Ricker","Full Ricker","Summer PDO","Winter PDO")
NLL <- c(reduced_Rick$logLik, full_Rick$logLik, summer_Rick$logLik, winter_Rick$logLik)
meanProd<-c(reduced_Rick$states[1,1] ,full_Rick$states[2,1],summer_Rick$states[3,1],winter_Rick$states[3,1])
SEs<-c(reduced_Rick$states.se[1,1] ,full_Rick$states.se[2,1],summer_Rick$states.se[3,1],winter_Rick$states.se[3,1])
tab <- cbind.data.frame(mods, NLL, meanProd, SEs)
is.num <- sapply(tab, is.numeric)
tab[is.num] <- lapply(tab[is.num], round, 3)
kable(tab, col.names = c("Models", "Likelihood", "Means", "SE"))
| Models | Likelihood | Means | SE |
|---|---|---|---|
| Reduced Ricker | -51.872 | 1.004 | 0.312 |
| Full Ricker | -51.809 | 0.000 | 0.000 |
| Summer PDO | -51.783 | 0.028 | 0.000 |
| Winter PDO | -51.386 | 0.154 | 0.000 |
#AICc Table
mods <- c("Reduced Ricker","TimeVar Intercept","Summer PDO","Winter PDO")
aic <- c(reduced_Rick$AICc, full_Rick$AICc, summer_Rick$AICc, winter_Rick$AICc)
daic <- aic-min(aic)
tab <- cbind.data.frame(mods, aic, daic)
kable(tab, col.names = c("Models", "AICc", "delta AICc"))
| Models | AICc | delta AICc |
|---|---|---|
| Reduced Ricker | 110.3590 | 0.000000 |
| TimeVar Intercept | 112.6711 | 2.312095 |
| Summer PDO | 117.8998 | 7.540854 |
| Winter PDO | 117.1048 | 6.745832 |
According to AICc, the most parsimonious model was the Reduced Ricker, which is essentially a random walk model. While the more complex models did decrease the negative log-likelihoods, the improvements in fit were small, and adding parameters didn’t improve fit enough to justify the more complex models.
We used the best-fitting model, the Reduced Ricker, for the forecast.
Forecasting from a DLM involves two steps:
Get an estimate of the regression parameters at time \(t\) from data up to time \(t-1\). These are also called the one-step ahead forecast (or prediction) of the regression parameters.
Make a prediction of \(y\) at time \(t\) based on the predictor variables at time \(t\) and the estimate of the regression parameters at time \(t\) (step 1). This is also called the one-step ahead forecast (or prediction) of the observation.
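In symbols (our sketch, following the MARSS Kalman filter output used below, where `xtt1` holds \(\boldsymbol{\eta}_t\) and `Vtt1` holds \(\boldsymbol{\Phi}_t\)):

\[
\boldsymbol{\eta}_t = \text{E}\left[\boldsymbol{\theta}_t \mid y_{1:t-1}\right], \qquad
\hat{y}_t = \mathbf{Z}_t \boldsymbol{\eta}_t, \qquad
\text{Var}(\hat{y}_t) = \mathbf{Z}_t \boldsymbol{\Phi}_t \mathbf{Z}_t^\top + R,
\]

where \(\boldsymbol{\theta}_t\) are the regression parameters (here just \(\alpha_t\)) and \(\boldsymbol{\Phi}_t\) is their one-step-ahead variance.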
## extend the window through 2005; the response is NA for 1999-2005, which MARSS treats as missing values to be forecast
SR_dat_short2 <- SR_data[SR_data[,"brood_year"] >= 1956 & SR_data[,"brood_year"] <= 2005, ]
#Create a column for the response variable
SR_dat_short2$Yt <- log(SR_dat_short2$recruits/SR_dat_short2$spawners) #Yt represents log of recruits over spawners.
#Create the observation matrix
Yt <- t(SR_dat_short2$Yt)
#determines number of elements
TT <- nrow(SR_dat_short2)
# our top model from above: Reduced Ricker
#create the model params
Z <- array(NA,c(1,1,TT))
Z[1,1,] <- c(1)
modlist <- list( # All other arguments are left as default
Z = Z,
U = "zero",
A = "zero",
B = "identity"
)
# Refit the model on the extended time window
reduced_Rick <- MARSS(Yt, model = modlist)
## Success! abstol and log-log tests passed at 26 iterations.
## Alert: conv.test.slope.tol is 0.5.
## Test with smaller values (<0.1) to ensure convergence.
##
## MARSS fit is
## Estimation method: kem
## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
## Estimation converged in 26 iterations.
## Log-likelihood: -51.87353
## AIC: 109.7471 AICc: 110.3625
##
## Estimate
## R.R 0.213
## Q.Q 0.301
## x0.x0 0.998
## Initial states (x0) defined at t=0
##
## Standard errors have not been calculated.
## Use MARSSparamCIs to compute CIs and bias estimates.
## get list of Kalman filter output
kf_out <- MARSSkfss(reduced_Rick)
## one-step-ahead forecasts of the regression parameters; 1 x T matrix
eta <- kf_out$xtt1
## ts of E(forecasts)
fore_mean <- vector()
for (t in 1:TT) {
fore_mean[t] <- Z[, , t] %*% eta[, t, drop = FALSE]
}
plot(fore_mean)
plot(SR_dat_short2$Yt)
plot(x=SR_dat_short2$brood_year , y= fore_mean, type = "b", pch = 19,
col = "red", xlab = "Year", ylab = "log(R/S)", ylim = c(-2, 3), xlim = c(1956, 1998))
lines(SR_dat_short2$brood_year, SR_dat_short2$Yt, pch = 18, col = "blue", type = "b", lty = 2)
legend("topright", legend = c("forecast", "data"),
       col = c("red", "blue"), lty = 1:2, cex = 0.8)
The one-step-ahead forecast follows the trend of the data, but lags the data by one year.
# compute the forecast variance:
## variance of regr parameters; 1x1xT array
Phi <- kf_out$Vtt1
## obs variance; 1x1 matrix
R_est <- coef(reduced_Rick, type = "matrix")$R
## ts of Var(forecasts)
fore_var <- vector()
m <- 1 # number of regression parameters
for (t in 1:TT) {
tZ <- matrix(Z[, , t], m, 1) ## transpose of Z
fore_var[t] <- Z[, ,t] %*% Phi[, , t] %*% tZ + R_est
}
plot(x = SR_dat_short2$brood_year, y=fore_mean, type = "l", lwd = 2, xlab = "Year", ylab = "Log(R/S)", ylim = c(-3,3))
## +/- 2 SD prediction interval: take the square root of the forecast variance
lines(x = SR_dat_short2$brood_year, y = fore_mean - 2*sqrt(fore_var), lty = 2)
lines(x = SR_dat_short2$brood_year, y = fore_mean + 2*sqrt(fore_var), lty = 2)
points(x =SR_dat_short2$brood_year, y = SR_dat_short2$Yt)
This model fit the data reasonably well: the forecast follows the trend of the data, though with a one-year lag, and the observations fall within the prediction interval. The prediction flattens out at the end of the time series, which makes sense given this is a random walk model informed by the last year of data.
## forecast errors
innov <- kf_out$Innov
## Q-Q plot of innovations
qqnorm(t(innov), main = "", pch = 16, col = "blue")
## add y=x line for easier interpretation
qqline(t(innov))
There are some odd trends in this Q-Q plot, and to be honest, I’m not sure of the best way to interpret them; the deviations from the line suggest the innovations are not perfectly normal.
The Kvichak River system flows into Bristol Bay, Alaska. Other environmental factors that could influence time-varying productivity include sea ice dynamics, which drive the underlying primary productivity and the plankton food available to salmon that have migrated to the Bering Sea, with potential impacts on fecundity and survival. Sea ice extent is correlated with the abundance of certain copepod species of varying lipid content, and some species may be better food for salmon. Sea ice would likely be correlated with sea surface temperature, but SST could be another covariate to explore, as the Bering Sea has shown evidence of warming in recent decades.
Ultimately this lab was a good lesson in covariates not improving model fits enough to justify more complex models. While we can speculate about the effects of environmental cycles such as the PDO on spawner-recruit relationships, sometimes the simplest models explain enough of the relationship that additional complexity isn’t necessary. In this lab, the model with the lowest AICc was essentially a random walk process, and while adding density dependence and environmental covariates such as summer and winter PDO slightly improved the negative log-likelihoods, the gain in fit wasn’t worth the additional parameters.
Eric completed the data cleaning and set up the primary four models. Madi added plots and created the NLL and AICc tables and both Karl and Madi tackled the forecast portion of the lab. Discussion and notes in the document were added by Madi and Eric and Karl proofread and edited where appropriate.